How to Reduce Dimensionality of Data: Robustness Point of View
Abstract
Data analysis in management applications often requires handling data with a large number of variables. Dimensionality reduction is therefore a common and important step in the analysis of multivariate data by methods of both statistics and data mining. This paper gives an overview of robust dimensionality reduction procedures, which are resistant to the presence of outlying measurements. The main contribution of the paper is a simulation study that compares various standard and robust dimensionality reduction procedures in combination with standard and robust methods of classification analysis. Standard methods turn out not to perform too badly on data that are only slightly contaminated by outliers; for highly contaminated data, we give practical recommendations concerning the choice of a suitable robust dimensionality reduction method. In particular, the highly robust principal component analysis based on the projection pursuit approach yields the most satisfactory results over four different simulation studies. We also give recommendations on the choice of a suitable robust classification method.
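The projection pursuit idea mentioned in the abstract can be illustrated with a minimal sketch. It is not necessarily the procedure evaluated in the paper; it is a simplified variant that assumes the coordinate-wise median as the center, the centered observations as candidate directions, and the median absolute deviation (MAD) as the robust scale. The names `pp_pca`, `mad`, and `dot` are hypothetical helpers introduced only for this illustration.

```python
import math
import random
import statistics


def mad(values):
    """Median absolute deviation, scaled to match the standard deviation for normal data."""
    values = list(values)
    med = statistics.median(values)
    return 1.4826 * statistics.median(abs(v - med) for v in values)


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pp_pca(rows, n_components, scale=mad):
    """Greedy projection pursuit: pick directions that maximize `scale` of the projections."""
    p = len(rows[0])
    # Assumption of this sketch: coordinate-wise median as the robust center.
    center = [statistics.median(r[j] for r in rows) for j in range(p)]
    resid = [[r[j] - center[j] for j in range(p)] for r in rows]
    components = []
    for _ in range(n_components):
        # Candidate directions: the centered observations, scaled to unit length.
        candidates = []
        for r in resid:
            norm = math.sqrt(dot(r, r))
            if norm > 1e-12:
                candidates.append([x / norm for x in r])
        # Keep the candidate whose projections have the largest scale.
        best = max(candidates, key=lambda a: scale([dot(r, a) for r in resid]))
        components.append(best)
        # Deflate: remove the chosen direction so later directions are orthogonal to it.
        deflated = []
        for r in resid:
            s = dot(r, best)
            deflated.append([x - s * b for x, b in zip(r, best)])
        resid = deflated
    return center, components


# Synthetic data: most variation lies along the first axis, and 5% of the
# points are gross outliers along the second axis.
rng = random.Random(0)
clean = [[rng.gauss(0, 5), rng.gauss(0, 0.5)] for _ in range(190)]
outliers = [[rng.gauss(0, 0.5), 50 + rng.gauss(0, 0.5)] for _ in range(10)]
data = clean + outliers

_, robust = pp_pca(data, 1)                                   # MAD as projection index
_, classical_like = pp_pca(data, 1, scale=statistics.pstdev)  # non-robust index

print("MAD index, first direction:", robust[0])
print("Std index, first direction:", classical_like[0])
```

With the MAD as the projection index, the first direction should stay close to the axis that carries the bulk of the data, whereas swapping in the standard deviation lets the ten outliers dominate the choice. The search over observed directions costs on the order of n squared projections per component, so this sketch suits only small samples.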
Similar resources
A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters
Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...
Evaluation of Professional Ethics Observance in Nursing Practice from Nurses and Patients’ Point of View in Shahid Beheshti University of Medical Sciences’ Teaching Hospitals
Introduction: Nursing is inherently a moral and ethical practice, and health care quality is largely contingent upon how nurses fulfill their duties. Therefore, this study evaluated professional ethics observance in nursing practice from the viewpoints of nurses and patients in teaching hospitals of Shahid Beheshti University of Medical Sciences in 2014. Methods: This descriptive cross-sectional stu...
A Geometric View of Similarity Measures in Data Mining
The main objective of data mining is to acquire information from a set of data for prospective applications using a measure. A common difficulty is that one often has to deal with large-scale data. Several dimensionality reduction techniques, such as various feature extraction methods, have been developed to address this issue. However, the geometric view of the applied measure, as an additional consid...
Study the Life Skills of 11-19 year old Children affected by Thalassemia referring to Educational and Remedial Centers in Rasht city from their Mothers’ Point of View 2009-2010
Introduction: Children affected by chronic diseases such as thalassemia have more mental and social problems than healthy people. Adapting to such conditions requires awareness of the ways to overcome these problems. Gaining life skills, together with knowledge and science, an appropriate change of attitudes and values, and reinforcement of appropriate behaviors, leads to normal behavio...
Hyperspectral Image Classification Based on the Fusion of the Features Generated by Sparse Representation Methods, Linear and Non-linear Transformations
The ability to record the high-resolution spectral signature of the earth's surface is arguably the most important feature of hyperspectral sensors. Classification of hyperspectral imagery, in turn, is one of the methods for extracting information from these remote sensing data sources. Despite the high potential of hyperspectral images from the information content point of view, there...